This post explores the TidyTuesday 2020-11-10 dataset, which provides mobile and landline subscription indicators per country from 1990 to 2017.
The data has already been cleaned using this script
(I’ll show the cleaning process without running it; the code is copied from the GitHub link).
TidyTuesday datasets can be downloaded using the tidytuesdayR package.
The dataset comes with two objects: a mobile and a landline data frame.
We will be using the following packages:
- tidytuesdayR => to get the data
- countrycode, janitor => not used (used in the cleaning step)
- ggthemes => for better looking ggplots
- gganimate => to animate a ggplot
- gridExtra => to make subplots
# tidytuesdayR package
if(!require("tidytuesdayR")){install.packages("tidytuesdayR")}
# Either ISO-8601 date or year/week works!
tuesdata <- tidytuesdayR::tt_load('2020-11-10')
##
## Downloading file 1 of 2: `mobile.csv`
## Downloading file 2 of 2: `landline.csv`
mobile <- tuesdata$mobile
landline <- tuesdata$landline
# Other packages
if(!require("countrycode")){install.packages("countrycode")}
if(!require("tidyverse")){install.packages("tidyverse")}
if(!require("janitor")){install.packages("janitor")}
if(!require("ggthemes")){install.packages("ggthemes")}
if(!require("gganimate")){install.packages("gganimate")}
if(!require("gridExtra")){install.packages("gridExtra")}
library(tidyverse)
############### copied and not run from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-11-10/readme.md
## Shown for information
mobile_df <- raw_mobile %>%
janitor::clean_names() %>%
rename(
total_pop = 4,
"gdp_per_cap" = 6,
"mobile_subs" = 7
) %>%
filter(year >= 1990) %>%
select(-continent) %>%
mutate(continent = countrycode::countrycode(
entity,
origin = "country.name",
destination = "continent"
)) %>%
filter(!is.na(continent))
landline_df <- raw_landline %>%
janitor::clean_names() %>%
rename(
total_pop = 4,
"gdp_per_cap" = 6,
"landline_subs" = 7
) %>%
filter(year >= 1990) %>%
select(-continent) %>%
mutate(continent = countrycode::countrycode(
entity,
origin = "country.name",
destination = "continent"
)) %>%
filter(!is.na(continent))
mobile_df %>%
write_csv("2020/2020-11-10/mobile.csv")
landline_df %>%
write_csv("2020/2020-11-10/landline.csv")
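The continent column in the cleaning script comes from countrycode::countrycode(). As a minimal sketch of what that lookup does, the example below runs it on a few hand-picked names. These names, including the deliberately unmatched "Atlantis", are illustrative assumptions and are not rows taken from the dataset.

```r
# Toy illustration of the continent lookup used in the cleaning script.
# The entity names are illustrative inputs, not rows from the dataset.
library(countrycode)

entities <- c("France", "Kenya", "Japan", "Brazil", "Atlantis")

# Names that cannot be matched come back as NA (countrycode also warns),
# which is what the final filter(!is.na(continent)) in the script removes.
continents <- suppressWarnings(
  countrycode(entities, origin = "country.name", destination = "continent")
)

data.frame(entity = entities, continent = continents)
```

Any entity that is not recognised as a country name gets an NA continent and is dropped by the last filter() of the cleaning step.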
Let’s explore the mobile and landline subscription trends per continent.
library(ggplot2)
library(ggthemes)
library(gridExtra)
# Mobile Data summary
mobile_mean_plot<-mobile%>%
group_by(continent, year)%>%
summarise(mobile_subs_mean=mean(mobile_subs, na.rm = T))%>%
{# to pass the dot
# ggplot aes
ggplot(.,aes(year,mobile_subs_mean,color=continent, group=continent))+
# lines
geom_line()+
geom_hline(yintercept = 100, color="gray")+
# aspect
scale_y_continuous(breaks = seq(0, max(.$mobile_subs_mean), 20))+
labs(title = "Mean mobile subscriptions (per 100 persons) per continent")+
theme_hc()
}
# Landline Data summary
landline_mean_plot<- landline%>%
group_by(continent, year)%>%
summarise(landline_subs_mean=mean(landline_subs, na.rm = T))%>%
# ggplot
ggplot(aes(year,landline_subs_mean,color=continent, group=continent))+
# line
geom_line()+
# aspect
labs(title = "Mean landline subscriptions (per 100 persons) per continent")+
theme_hc()
# Plot both graphs
gridExtra::grid.arrange(mobile_mean_plot,landline_mean_plot, ncol=2)
Mobile subscriptions have grown very rapidly on all continents since 1990. Africa was the last continent to start this transition to digital devices.
However, the rise in African mobile users since 2004-2005 is remarkable. In 2000, Africa had 2.54 mobile users per 100 people while Europe had 40.3 (in other words, Europe was, on average, 15 to 16 times better equipped with mobile devices than Africa). This vast gap narrowed every year: by 2010 Africa already had 55.8 mobile subscriptions per 100 people while Europe had 116 (this time Europe was about twice as well equipped as Africa). In 2017, Europe had only 40% more mobile subscriptions per 100 people than Africa.
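The "15 to 16 times" and "twice" figures are simple ratios of the continent means quoted in the paragraph. A small sketch of that arithmetic, using only the four numbers from the text (the vector names are chosen here for illustration):

```r
# Figures quoted in the text (mobile subscriptions per 100 people)
africa <- c(`2000` = 2.54, `2010` = 55.8)
europe <- c(`2000` = 40.3, `2010` = 116)

# How many times more subscriptions per 100 people Europe had than Africa
ratio <- europe / africa
round(ratio, 1)
#> 2000 2010
#> 15.9  2.1
```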
It’s also interesting to see that since 2007, European citizens have had, on average, more than one mobile subscription per person. This upward trend seems to have plateaued since 2010 for Europe and the Americas.
Landline subscriptions followed an interesting path. The growth was not as steady as with mobile devices. For most continents, subscriptions grew until the early 2000s and then started to fall. In 2017, Europe was about as equipped with landlines as it was in 1990 (35.5 vs 33.5 per 100 people). The downward trend seems to be less noticeable for Africa and Asia.
A lot of countries have missing values for some years. The continent that suffers the most from missing data is Oceania, followed by the Americas. In 2015, the percentage of missing values increased for every continent. Overall, Asia seems to be the continent with the most complete data in this dataset.
mobile_missing<-mobile%>%
# Summary
mutate(mobile_is_na= ifelse(is.na(mobile_subs),"NA value","Non NA value"))%>%
group_by(year, continent)%>%
summarise(number_of_rows= n(),
mobile_is_na= sum(mobile_is_na=="NA value"),
mobile_is_na_pct= mobile_is_na/number_of_rows)%>%
# ggplot
ggplot(aes(year, mobile_is_na_pct,color=continent))+
# line
geom_line(position=position_dodge(width = 0.9))+
# aspect
scale_x_continuous(breaks = seq(1990, 2017, 2))+
labs(title = "Percentage of missing mobile subscriptions data")+
theme_hc()
landline_missing<-landline%>%
# Summary data
filter(!year %in% c(2018:2019))%>% # these years don't have landline data
mutate(landline_is_na= ifelse(is.na(landline_subs),"NA value","Non NA value"))%>%
group_by(year, continent)%>%
summarise(number_of_rows= n(),
landline_is_na= sum(landline_is_na=="NA value"),
landline_is_na_pct= landline_is_na/number_of_rows)%>%
# ggplot
ggplot(aes(year, landline_is_na_pct,color=continent))+
# line
geom_line(position=position_dodge(width = 0.9))+
# aspect
scale_x_continuous(breaks = seq(1990, 2017, 2))+
labs(title = "Percentage of missing landline subscriptions data")+
theme_hc()
# Plot both graphs
gridExtra::grid.arrange(mobile_missing,landline_missing, ncol=2)
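The share of missing values per group can also be written more compactly, because the mean of a logical vector is the proportion of TRUE values. The sketch below uses a small made-up table (the `toy` data frame and its values are assumptions for illustration, not the TidyTuesday data) and gives the same kind of result as the ifelse() and sum() approach above.

```r
library(dplyr)

# Toy data (made up for illustration): one row per country and year
toy <- tibble(
  continent   = c("A", "A", "A", "A", "B", "B"),
  year        = c(2000, 2000, 2001, 2001, 2000, 2001),
  mobile_subs = c(10, NA, 12, 15, NA, NA)
)

# is.na() returns TRUE/FALSE, so its mean is the share of missing values
toy_missing <- toy %>%
  group_by(year, continent) %>%
  summarise(
    number_of_rows   = n(),
    mobile_is_na_pct = mean(is.na(mobile_subs)),
    .groups = "drop"
  )

toy_missing
```

Note that the result is a proportion between 0 and 1, as in the plots above, so it needs to be multiplied by 100 to read as a percentage.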
Let’s calculate the yearly growth rate of mobile subscriptions for every continent. We will use handy dplyr functions such as lag() together with the power of grouping.
# Calculate growth rates per year
rates<-mobile%>%
group_by(year, continent)%>%
summarise(mobile_subs_mean=mean(mobile_subs, na.rm = T))%>%
arrange(continent)%>%
ungroup()%>%
group_by(continent)%>%
mutate(evolution= (mobile_subs_mean-lag(mobile_subs_mean))/lag(mobile_subs_mean))
rates%>%
# ggplot()
ggplot(aes(x=year, y=evolution,fill=continent, color=continent))+
# lines and points
geom_point()+
geom_line(alpha=0.2)+
# aspect
theme_hc()+
scale_x_continuous(breaks = seq(1990, 2017, 2))+
labs(title = "Mobile subscriptions growth rate per continent")
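To make the lag()-based formula concrete, here is a minimal sketch on a made-up table (the `toy_rates` object and its values are assumptions for illustration only). Within each group, lag() returns the previous row's value, so the first year of every continent has no growth rate.

```r
library(dplyr)

# Toy yearly means (made up for illustration) for two hypothetical continents
toy_rates <- tibble(
  continent        = rep(c("A", "B"), each = 3),
  year             = rep(2000:2002, times = 2),
  mobile_subs_mean = c(10, 20, 30, 50, 50, 75)
) %>%
  group_by(continent) %>%
  arrange(year, .by_group = TRUE) %>%  # lag() relies on rows being in year order
  mutate(
    evolution = (mobile_subs_mean - lag(mobile_subs_mean)) / lag(mobile_subs_mean)
  ) %>%
  ungroup()

toy_rates$evolution
#> [1]  NA 1.0 0.5  NA 0.0 0.5
```

A value of 1 means the mean doubled compared with the previous year, and the NA at the start of each group is why the first year has no point in the plot.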
As we saw in the previous graph, Africa joined the trend late, but when it did, mobile subscriptions per 100 people more than doubled each year for several years. The other continents had some ups and downs in their yearly growth rates. Since 2010, the five continents have displayed very similar yearly growth in mobile subscriptions.
We will use gganimate to make our plot more dynamic.
The growth rate axis is capped at 250% (there were some outliers present).
library(gganimate)
country_rates<-mobile%>%
group_by(year,continent, entity)%>%
summarise(mobile_subs_mean=mean(mobile_subs, na.rm = T),
pop=total_pop)%>%
arrange(entity)%>%
ungroup()%>%
group_by(entity)%>%
mutate(evolution= (mobile_subs_mean-lag(mobile_subs_mean))/lag(mobile_subs_mean) ,# growth rates
evolution=na_if(evolution, Inf) ,# remove inf values
year=as.integer(year) # so years act as whole numbers in gganimate
)
evol_annimate<-ggplot(country_rates, aes(year, evolution, colour = entity)) +
# points and lines
geom_point(alpha = 1, show.legend = FALSE, aes(size = pop)) +
geom_line(alpha=0.3,size=2)+
# aspect
scale_y_continuous(limits=c(0,2.5))+
scale_x_continuous(breaks = seq(1990, 2017, 2))+
scale_size(range = c(2, 12)) +
theme_hc()+
theme(legend.position = 'none')+ # after theme_hc(), which would otherwise reset the legend
# faceting
facet_wrap(~continent) +
# gganimate
transition_reveal(year) + # to reveal the graph gradually
labs(title = 'Year: {frame_along}', # displays the year
x = 'Year', y = 'Growth rate of mobile subscriptions')
animate(evol_annimate,height = 800, width =800,renderer = gifski_renderer())
# if you want to save it
#anim_save(filename="gif.continents.evol.gif", animation = last_animation(),height = 800, width =800)
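A note on how the 250% cap behaves: limits set through scale_y_continuous() do not just zoom the axis. By default, ggplot2 replaces values outside the limits with NA, so outliers above 2.5 and negative growth rates are not drawn at all. The sketch below shows that default out-of-bounds rule with scales::censor(), and scales::squish() as one possible alternative. The `growth` vector is made up for illustration.

```r
library(scales)

# Made-up growth rates, two of them outside the c(0, 2.5) limits
growth <- c(-0.1, 0.4, 1.2, 2.5, 3.7)

# Default rule for continuous scales: out-of-range values become NA
censored <- censor(growth, range = c(0, 2.5))
censored
#> [1]  NA 0.4 1.2 2.5  NA

# Alternative: clamp out-of-range values to the nearest limit instead
squished <- squish(growth, range = c(0, 2.5))
squished
#> [1] 0.0 0.4 1.2 2.5 2.5
```

If the goal were to zoom without dropping any points, coord_cartesian(ylim = c(0, 2.5)) would be another option, at the cost of lines running off the panel.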
Showing shadows instead of lines:
evol_annimate<-ggplot(country_rates, aes(year, evolution, colour = continent)) +
# points and lines
geom_point(alpha = 1, show.legend = FALSE, aes(size = pop)) +
# aspect
scale_y_continuous(limits=c(0,2.5))+
scale_x_continuous(breaks = seq(1990, 2017, 2))+
scale_size(range = c(2, 12)) +
theme_hc()+
theme(legend.position = 'none')+ # after theme_hc(), which would otherwise reset the legend
# faceting
facet_wrap(~continent) +
# gganimate
transition_time(year) + # animate over time, one state per year
labs(title = 'Year: {frame_time}', # displays the year
x = 'Year', y = 'Growth rate of mobile subscriptions')+
shadow_mark(past=T, alpha = 0.3, size = 0.9)
animate(evol_annimate,height = 800, width =800,renderer = gifski_renderer())